Let’s look at an example, the diamonds data that comes with the ggplot2 package. But first, let’s load the package into our session using the library() function. If you have not yet installed the ggplot2 package, you should do this first (you only have to do this once). Recall the magic command is: install.packages("ggplot2")
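Here is a minimal sketch of the install-once, load-every-session pattern. The requireNamespace() check is an addition, not part of the original notes: it only runs the installation when ggplot2 is missing.

```r
# Install ggplot2 only if it is not already available (needs an internet connection)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
# Attach ggplot2 to the current session (do this in every new session)
library(ggplot2)
```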
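Some information about the diamonds dataset: it contains the prices and other attributes of almost 54,000 diamonds (type ?diamonds in R for a description of each variable).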
Load the diamonds data set, get the dimensions, and look at the first few lines
data(diamonds, package="ggplot2")
dim(diamonds)
# [1] 53940 10
head(diamonds)
# # A tibble: 6 × 10
# carat cut color clarity depth table price x y z
# <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
# 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
# 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
# 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
# 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
# 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
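Besides dim() and head(), a few base-R helpers give a quick overview of the variables. This is a sketch; the choice of variables to summarise is only for illustration.

```r
library(ggplot2)          # makes the diamonds data available
str(diamonds)             # type of each column (numeric, ordered factor, integer)
summary(diamonds$price)   # min, quartiles, mean and max of price (US dollars)
levels(diamonds$cut)      # ordered categories of cut, from worst to best
table(diamonds$cut)       # number of diamonds in each cut category
```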
Recall how to make a scatterplot with the default settings of the plot() function:
plot(diamonds$carat,diamonds$price) # x-variable first in this notation
Here is a caption.
# or
plot(price~carat, data=diamonds) # an alternative way: this is y against x
Here is a caption.
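The base-R plot can also be customised through its arguments. Here is a sketch using standard graphical parameters (xlab, ylab, main, pch, cex, col); the label text and colour values are arbitrary choices.

```r
library(ggplot2)  # only needed here for the diamonds data
plot(price ~ carat, data = diamonds,
     xlab = "carat value", ylab = "price ($)",   # axis labels
     main = "Diamond price against carat",       # plot title
     pch = 20, cex = 0.3,                        # small solid dots
     col = rgb(0, 0, 1, alpha = 0.2))            # semi-transparent blue
```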
Note: If you want to change the figure size and add a figure caption in Rmd, you can specify the fig.height, fig.width, and fig.cap options inside the curly brackets at the beginning of your R code chunk. See the knitr documentation for the full set of chunk options. Bonus: If you don’t want to set the size each time you generate a plot, you can instead set output options such as fig_width and fig_height in the YAML header at the beginning of your Rmd file.
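For a single chunk, a header such as {r, fig.width=6, fig.height=4, fig.cap="Price against carat"} sets the size (in inches) and the caption. If you want defaults for every chunk, one common option is knitr::opts_chunk$set() in a setup chunk near the top of the Rmd file. This is an alternative to the YAML route and is not from the original notes; the values below are arbitrary.

```r
# Run this in a setup chunk near the top of the Rmd file;
# later chunks inherit these defaults unless they override them
knitr::opts_chunk$set(fig.width = 6, fig.height = 4)
```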
Here comes ggplot! Using the default settings in ggplot()
library(ggplot2) ## attach ggplot2 first: theme_set() and theme_bw() come from it
theme_set(theme_bw()) ## use the black-and-white theme for all following plots
ggplot(diamonds, aes(x=carat,y=price)) +
geom_point() +
labs(y = 'price $', x = 'carat value') ## axis labels
ggplot(diamonds, aes(x=carat,y=price)) + geom_point()
The first component (before the “+”) calls the ggplot() function with the data and the x-y variables, which are mapped inside aes(). The second component (after the first “+”) tells ggplot what type of plot you want, e.g., geom_point(), geom_bar(), geom_histogram(), or geom_boxplot(). You can add further components for more customization of the plot, e.g., a title, labels for the axes, etc. For a whole book on ggplot2 customization, see https://rkabacoff.github.io/datavis/
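Because a ggplot is an ordinary R object, you can store the first component and add the other components to it step by step. A minimal sketch; the object names p and p_labelled and the title text are only illustrative.

```r
library(ggplot2)
# Component 1: data and aesthetic mapping (no layers yet, so no points are drawn)
p <- ggplot(diamonds, aes(x = carat, y = price))
# Component 2: the geometry that turns each row into a point
p <- p + geom_point()
# Further components: title and axis labels
p_labelled <- p + labs(title = "Diamond prices", x = "carat value", y = "price ($)")
p_labelled  # printing the object draws the plot
```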
Very easy to switch to a boxplot. We can use geom_boxplot to create boxplots when one variable is continuous and the other is a factor.
ggplot(diamonds, aes(x=cut,y=price)) + geom_boxplot()
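Price is strongly right-skewed, so the boxes are squashed near the bottom of the plot. One possible tweak, not from the original notes, is a log10 y-axis via scale_y_log10(); varwidth = TRUE additionally makes each box width proportional to the square root of the number of diamonds in that group.

```r
library(ggplot2)
p_box <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot(varwidth = TRUE) +  # wider boxes for larger groups
  scale_y_log10()                  # log10 scale spreads out the skewed prices
p_box
```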
You can control the aesthetics of each layer, e.g. colour, size, shape, alpha (opacity) etc. https://ggplot2.tidyverse.org/reference/geom_point.html
ggplot(diamonds, aes(carat, price)) + geom_point(col = "blue")
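A common pitfall: a constant set outside aes(), as above, is applied directly to every point. The same constant placed inside aes() is treated as a data value and passed through the colour scale, so the points get the default discrete colour (not blue) and a legend labelled "blue" appears. Here is a sketch contrasting the two:

```r
library(ggplot2)
# Setting: the colour is applied as-is to every point
p_set <- ggplot(diamonds, aes(carat, price)) + geom_point(colour = "blue")
# Mapping: "blue" becomes a one-level variable, coloured by the default discrete scale
p_map <- ggplot(diamonds, aes(carat, price)) + geom_point(aes(colour = "blue"))
```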
Changing the alpha level
ggplot(diamonds, aes(x=carat,y=price)) + geom_point(alpha = 0.2)
Changing the point size
ggplot(diamonds, aes(x=carat,y=price)) + geom_point(size = 0.2)
Changing the shape and the point size
ggplot(diamonds, aes(x=carat,y=price)) + geom_point(shape = 2,size=0.4)
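These options can be combined. Shapes 21 to 25 have both an outline (colour) and an interior (fill). Here is a sketch; the specific values are only an illustrative choice.

```r
library(ggplot2)
p_pts <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(shape = 21,          # circle with separate outline and fill
             colour = "black",    # outline colour
             fill = "orange",     # interior colour
             size = 1, alpha = 0.3)
p_pts
```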
The real power of ggplot is its ability to combine layers
ggplot(diamonds, aes(x=carat,y=price)) + geom_point(size = 0.2) +
geom_smooth()
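By default, geom_smooth() picks the smoothing method from the size of the data (for a dataset this large, a GAM from the mgcv package) and prints a message saying which one it used. The sketch below instead asks for a straight-line fit (method = "lm"), drops the confidence band (se = FALSE), and colours the line. Whether a linear fit is appropriate for these data is a separate question.

```r
library(ggplot2)
p_lm <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(size = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x,  # ordinary least-squares line
              se = FALSE, colour = "red")      # no ribbon; red line
p_lm
```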
In this case (and in many other situations), a log transformation may make the relationships between variables clearer. We can use coord_trans() to put both axes on a log10 scale:
ggplot(diamonds, aes(carat, price)) + geom_point(size = 0.5) +
coord_trans(x = "log10", y = "log10")
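An alternative is to transform the scales with scale_x_log10() and scale_y_log10(). The difference matters once statistics are involved. Scale transformations are applied before any statistic is computed, so the smoother below is fitted to log10(price) against log10(carat). By contrast, coord_trans() only changes how the already-computed result is drawn. Here is a sketch:

```r
library(ggplot2)
p_log <- ggplot(diamonds, aes(carat, price)) +
  geom_point(size = 0.5, alpha = 0.2) +
  scale_x_log10() + scale_y_log10() +          # transform the data scales
  geom_smooth(method = "lm", formula = y ~ x)  # straight line on the log-log scale
p_log
```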
We can color the points by a factor variable, here the diamond color grade (not that it’s useful for this example!)
ggplot(diamonds, aes(carat, price, colour=color)) + geom_point() +
coord_trans(x = "log10", y = "log10")
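Because color is an ordered factor, ggplot2 already uses a viridis palette for it by default. You can choose a different palette through the colour scale. Here is a sketch using the "plasma" option of scale_colour_viridis_d(); end = 0.9 skips the palette's palest yellow. The log axes are set with scale transformations here.

```r
library(ggplot2)
p_col <- ggplot(diamonds, aes(carat, price, colour = color)) +
  geom_point(size = 0.5) +
  scale_colour_viridis_d(option = "plasma", end = 0.9) +  # alternative discrete palette
  scale_x_log10() + scale_y_log10()
p_col
```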
We can also color by a continuous variable (not really useful for this example either, but here it is so you become familiar with the syntax):
ggplot(diamonds, aes(carat, price, colour=depth)) + geom_point() +
coord_trans(x = "log10", y = "log10")
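For a continuous variable, ggplot2 uses a colour gradient, and scale_colour_gradient() lets you choose its two end points. Here is a sketch; the two colours are arbitrary.

```r
library(ggplot2)
p_grad <- ggplot(diamonds, aes(carat, price, colour = depth)) +
  geom_point(size = 0.5) +
  scale_colour_gradient(low = "yellow", high = "darkred") +  # low depth to high depth
  scale_x_log10() + scale_y_log10()
p_grad
```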
In some cases, it may be more useful to get separate plots for each category of the third variable, to understand conditional relationships
ggplot(diamonds, aes(carat, price)) + geom_point() +
facet_wrap(~color, ncol=4)
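Here is a variation that can make the conditional relationships easier to see. If a layer's data does not contain the faceting variable, facet_wrap() repeats that layer in every panel. The sketch below draws all diamonds in light grey behind the diamonds of each color grade; all_diamonds is just an illustrative name.

```r
library(ggplot2)
# Copy of the data without the faceting variable: this layer appears in every panel
all_diamonds <- diamonds[, c("carat", "price")]
p_wrap <- ggplot(diamonds, aes(carat, price)) +
  geom_point(data = all_diamonds, colour = "grey80", size = 0.3) +  # all diamonds, in grey
  geom_point(size = 0.3) +                                          # diamonds of this color grade
  facet_wrap(~color, ncol = 4)
p_wrap
```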
Alternatively, you can use facet_grid(), which also allows more than one conditioning variable (tables of plots). With ~color, the categories are laid out as columns:
ggplot(diamonds, aes(carat, price)) + geom_point() +
facet_grid(~color, labeller=label_both)
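With two conditioning variables, facet_grid() builds a table of plots: the variable to the left of ~ defines the rows and the one to the right defines the columns. Here is a sketch with cut in the rows and color in the columns. That gives 35 panels, so small points help.

```r
library(ggplot2)
p_grid <- ggplot(diamonds, aes(carat, price)) +
  geom_point(size = 0.2) +
  facet_grid(cut ~ color, labeller = label_both)  # rows: cut, columns: color
p_grid
```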
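There are actually many ways to get the same plot! For example, the following command produces the same scatterplot of price against carat, with the aesthetic mapping given inside geom_point() instead of ggplot():
ggplot(diamonds) + geom_point(aes(carat, price))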
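Here is a sketch of several equivalent calls (the object names are illustrative). The data and the aesthetic mapping can be given in ggplot(), in the layer, or split between the two. You can also drop the argument names x = and y =, because x and y are the first two arguments of aes().

```r
library(ggplot2)
p1 <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
p2 <- ggplot(diamonds, aes(carat, price)) + geom_point()
p3 <- ggplot(diamonds) + geom_point(aes(carat, price))
p4 <- ggplot() + geom_point(data = diamonds, mapping = aes(carat, price))
```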
Let’s make a histogram.
ggplot(diamonds, aes(depth)) + geom_histogram()
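Notice the difference in the aes call: a histogram needs only an x variable, because it summarizes the distribution of a single quantitative variable. A boxplot can also summarize a single variable, but it is really designed for comparing multiple categories!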
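For comparison, here is a sketch of a boxplot of depth on its own. Only the y aesthetic is mapped, so there is a single box and the x-axis carries no information.

```r
library(ggplot2)
p_one <- ggplot(diamonds, aes(y = depth)) + geom_boxplot()  # one box for all diamonds
p_one
```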
The default options in geom_histogram() (30 bins) may not be sensible, and you often need to adjust the binwidth and the x-axis limits (xlim):
ggplot(diamonds, aes(depth)) + geom_histogram(binwidth=0.2) + xlim(56,67)
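Note that xlim() sets the scale limits. Observations outside 56 to 67 are therefore treated as missing and removed before the bars are computed, which is why ggplot2 warns about removed rows. If you only want to zoom in, coord_cartesian() changes the visible range without discarding any data. Here is a sketch comparing the two:

```r
library(ggplot2)
# xlim(): data outside the limits are removed (with a warning)
p_limits <- ggplot(diamonds, aes(depth)) + geom_histogram(binwidth = 0.2) + xlim(56, 67)
# coord_cartesian(): zooms the view, all data are still used for the bars
p_zoom <- ggplot(diamonds, aes(depth)) + geom_histogram(binwidth = 0.2) +
  coord_cartesian(xlim = c(56, 67))
```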
A better use of boxplot is when we want to compare distributions of a quantitative variable across categories of a factor variable, as previously discussed
ggplot(diamonds, aes(cut, depth)) + geom_boxplot()
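A boxplot summarises each distribution with five numbers. If you also want to see the shape of each distribution, one option (not from the original notes) is a violin plot, which is a mirrored density estimate. The sketch below draws narrow boxplots on top of the violins; outlier.shape = NA hides the boxplot outliers, since the violins already show them.

```r
library(ggplot2)
p_violin <- ggplot(diamonds, aes(cut, depth)) +
  geom_violin() +                                # density shape for each cut
  geom_boxplot(width = 0.1, outlier.shape = NA)  # compact box drawn on top
p_violin
```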
We can also get multiple histograms, though we need to either display them separately (less useful when comparing)
ggplot(diamonds, aes(depth)) + geom_histogram(binwidth = 0.2) +
facet_wrap(~cut) + xlim(56, 67)
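Here is a sketch of one variation. Stacking the panels in a single column (ncol = 1) lines up their x-axes. In addition, scales = "free_y" gives each panel its own count axis, which helps here because the cut categories differ a lot in size.

```r
library(ggplot2)
p_stack <- ggplot(diamonds, aes(depth)) +
  geom_histogram(binwidth = 0.2) +
  facet_wrap(~cut, ncol = 1, scales = "free_y") +  # one column, own y-axis per panel
  xlim(56, 67)
p_stack
```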
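Or, you can overlay the histograms (position="identity" draws them on top of each other instead of stacking them, and alpha makes them see-through):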
ggplot(diamonds, aes(depth, fill=cut)) +
geom_histogram(binwidth=0.2,colour="grey50",alpha=.4,position="identity") + xlim(56,67)
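Overlaid bars can still be hard to read. One alternative (not from the original notes) is geom_freqpoly(), which draws the same binned counts as lines. Mapping colour instead of fill gives one line per cut. Here is a sketch:

```r
library(ggplot2)
p_poly <- ggplot(diamonds, aes(depth, colour = cut)) +
  geom_freqpoly(binwidth = 0.2) +  # binned counts drawn as lines
  xlim(56, 67)
p_poly
```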
We are covering only a few of the many plot types that can be created with the ggplot2 package.
For a more comprehensive view of ggplot2, take a look at the ggplot2 Cheat sheet
#install.packages('maps') # you only need to do this once. maps package includes various maps that we can use.
#install.packages('sf') # you only need to do this once
library(maps) # Provides latitude and longitude data for various maps
library(sf)
# get the state boundary polygons (longitude/latitude) from the maps package
MainStates <- map_data("state")
#plot all states with ggplot2, using black borders and light blue fill
ggplot() +
geom_polygon( data=MainStates, aes(x=long, y=lat, group=group),
color="black", fill="lightblue" ) +
coord_sf(crs = st_crs(4326)) # draw in longitude/latitude (WGS 84, EPSG:4326)
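map_data() returns one row per polygon vertex, with columns long, lat, group, order and region (the lowercase state name). To colour the states by a variable, you can merge in a data frame keyed by region. Here is a sketch that uses the built-in state.x77 matrix from base R's datasets package (population estimates for July 1975, in thousands) as the example variable. It uses coord_quickmap() as a lighter-weight alternative to coord_sf(). Because merge() reorders the rows, they have to be put back in their drawing order.

```r
library(ggplot2)
library(maps)  # map_data() needs the maps package
MainStates <- map_data("state")
# Population estimates (1975, in thousands) from base R's state.x77 matrix
pop <- data.frame(region = tolower(rownames(state.x77)),
                  population = state.x77[, "Population"])
# Attach the population to every vertex of each state polygon (inner join on region)
StatesPop <- merge(MainStates, pop, by = "region")
# merge() sorts by region; restore the drawing order of the vertices
StatesPop <- StatesPop[order(StatesPop$order), ]
p_states <- ggplot(StatesPop, aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon(colour = "black") +
  coord_quickmap()
p_states
```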